JoaoJandre
Contributor

Description

On KVM, there are two types of snapshots: internal and external. On ACS, most snapshot/backup solutions use external snapshots; the exception is disk-and-memory VM snapshots, which use internal snapshots (this is a limitation of KVM, as far as I know).

However, since internal snapshots are stored inside the VM's volume (hence the name), if an internal snapshot is taken after an external snapshot and the external snapshot is restored, the internal snapshot is lost.
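
To make the failure mode concrete, here is a deliberately simplified toy model of the sequence described above. It is only a sketch: `VolumeFile` is a hypothetical stand-in for a volume image, not a CloudStack, libvirt, or QEMU type, and it assumes that restoring an external snapshot brings the volume back to the exact state captured when that snapshot was taken.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: a volume that carries its internal snapshots inside itself. */
public class InternalSnapshotLossDemo {

    /** Hypothetical stand-in for a volume image; not a real CloudStack type. */
    static final class VolumeFile {
        final List<String> internalSnapshots = new ArrayList<>();

        VolumeFile copy() {
            VolumeFile c = new VolumeFile();
            c.internalSnapshots.addAll(internalSnapshots);
            return c;
        }
    }

    /** Replays the sequence from the description and returns the volume after the restore. */
    static VolumeFile runScenario() {
        VolumeFile active = new VolumeFile();

        // 1. External snapshot: the volume's state is preserved outside the volume.
        VolumeFile externalSnapshot = active.copy();

        // 2. Internal snapshot: recorded inside the active volume itself.
        active.internalSnapshots.add("vm-snapshot-1");

        // 3. Restore the external snapshot: the volume returns to its state from step 1,
        //    which predates the internal snapshot.
        active = externalSnapshot.copy();
        return active;
    }

    public static void main(String[] args) {
        // Prints "Internal snapshots after restore: []"
        System.out.println("Internal snapshots after restore: " + runScenario().internalSnapshots);
    }
}
```

The internal snapshot disappears because it only ever existed inside the state that the restore discarded.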

Thus, this PR blocks the use of disk-and-memory VM snapshots alongside volume snapshots, NAS backups, and disk-only VM snapshots (at least the ones created using the default volume snapshot implementation).
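
The rule can be pictured as a mutual-exclusion check between the internal and external families. The sketch below is illustrative only and is not the code in this PR: the `Kind` enum, `isInternal`, and `validate` are hypothetical names, and it assumes every operation other than a disk-and-memory VM snapshot is external (which holds for the default volume snapshot implementation, but may not for third-party storage providers).

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative sketch; names are hypothetical and do not mirror CloudStack classes. */
public class SnapshotCompatibilitySketch {

    enum Kind { DISK_AND_MEMORY_VM_SNAPSHOT, DISK_ONLY_VM_SNAPSHOT, VOLUME_SNAPSHOT, NAS_BACKUP }

    /** Assumption: only disk-and-memory VM snapshots are internal on KVM. */
    static boolean isInternal(Kind kind) {
        return kind == Kind.DISK_AND_MEMORY_VM_SNAPSHOT;
    }

    /** Rejects the request if the VM already has an artifact from the other family. */
    static void validate(Kind requested, Set<Kind> existing) {
        for (Kind present : existing) {
            if (isInternal(present) != isInternal(requested)) {
                throw new IllegalStateException(
                        "Cannot create " + requested + " while the VM has a " + present);
            }
        }
    }

    public static void main(String[] args) {
        Set<Kind> existing = EnumSet.of(Kind.VOLUME_SNAPSHOT);

        // External alongside external is allowed.
        validate(Kind.NAS_BACKUP, existing);

        // Internal alongside external is rejected, and the check is symmetric.
        try {
            validate(Kind.DISK_AND_MEMORY_VM_SNAPSHOT, existing);
        } catch (IllegalStateException e) {
            System.out.println("Blocked: " + e.getMessage());
        }
    }
}
```

Because the check compares families rather than specific pairs, it blocks the combination in both directions while still allowing several external artifacts to coexist.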

I encourage maintainers of third-party storage providers to test whether their implementations are compatible with disk-and-memory VM snapshots; if they are not, simultaneous usage should be blocked.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

I created a VM and took a few disk-and-memory VM snapshots of it. Then I tried to create NAS backups, volume snapshots, and disk-only VM snapshots; each attempt returned an error, as expected.

I validated that the opposite was also true for the aforementioned cases, e.g., creating a volume snapshot and then trying to create a disk-and-memory VM snapshot.

I also validated that it was possible to create multiple NAS backups, disk-only VM snapshots and volume snapshots with no issues.

@JoaoJandre
Contributor Author

@slavkap @rp- I think it would be interesting to validate if the implementations done for Storpool and Linstor are compatible with disk-and-memory VM snapshots.


codecov bot commented Jun 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 3.63%. Comparing base (574ed78) to head (8938966).
⚠️ Report is 6 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (574ed78) and HEAD (8938966).

HEAD has 1 upload less than BASE
Flag        BASE (574ed78)   HEAD (8938966)
unittests   1                0
Additional details and impacted files
@@              Coverage Diff              @@
##               main   #11039       +/-   ##
=============================================
- Coverage     17.36%    3.63%   -13.73%     
=============================================
  Files          5888      441     -5447     
  Lines        525737    37019   -488718     
  Branches      64164     6785    -57379     
=============================================
- Hits          91274     1345    -89929     
+ Misses       424163    35513   -388650     
+ Partials      10300      161    -10139     
Flag        Coverage Δ
uitests     3.63% <ø> (-0.01%) ⬇️
unittests   ?


@JoaoJandre
Contributor Author

@blueorangutan package

@blueorangutan

@JoaoJandre a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 13798

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 13809

@DaanHoogland
Contributor

@JoaoJandre :

08:35:48 [ERROR] /jenkins/workspace/acs-centos8-pkg-builder/dist/rpmbuild/BUILD/cloudstack-4.20.2.0-SNAPSHOT/engine/storage/snapshot/src/test/java/org/apache/cloudstack/storage/vmsnapshot/VMSnapshotStrategyKVMTest.java:32:8: Unused import - org.apache.cloudstack.backup.dao.BackupDao. [UnusedImports]

@JoaoJandre
Contributor Author

@blueorangutan package

@blueorangutan

@JoaoJandre a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 13818

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13826

@rp-
Contributor

rp- commented Jul 3, 2025

Linstor currently does not support memory snapshots (we check and throw an error if one is selected).
So I guess we are currently not affected by any of this?

Contributor

@slavkap slavkap left a comment

code LGTM
I haven't tested it with NFS, but the StorPool smoke tests ran successfully

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-13723)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 53428 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr11039-t13723-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@weizhouapache
Member

@JoaoJandre
this seems to be included in PR #10632
do you still want it in 4.20.2 ?

@JoaoJandre
Contributor Author

@JoaoJandre this seems to be included in PR #10632 do you still want it in 4.20.2 ?

@weizhouapache PR #10632 blocks the usage of the feature introduced in #10632 together with other incompatible features. This PR purposefully ignores #10632 and adds restrictions to avoid other interactions between internal and external snapshots, such as between volume snapshots and disk-and-memory VM snapshots.

They are complementary. When merging this PR forward, care should be taken so that the validations of both PRs do not erase one another (I can make the merge forward if needed).

@weizhouapache
Member

weizhouapache commented Aug 28, 2025

ok @JoaoJandre
I think the best option might be to re-target this PR to 4.22, which includes #10632, to avoid rework.

@DaanHoogland
Contributor

aren't we talking 4.20.2 , @weizhouapache ?

@weizhouapache
Member

sorry, I meant 4.22, not 4.21

if we merge into 4.20.2, the merge forward to 4.22 will be troublesome, as @JoaoJandre mentioned,
unless we ignore this PR in the merge forward and @JoaoJandre creates another PR against 4.22 (which needs re-review and re-testing)

@JoaoJandre
Contributor Author

@DaanHoogland @weizhouapache I rebased the changes so now I'm targeting main.

@weizhouapache
Member

@JoaoJandre
thanks for the update, overall LGTM.
left a small comment

Member

@weizhouapache weizhouapache left a comment

code lgtm

not tested

@weizhouapache weizhouapache modified the milestones: 4.20.2, 4.22.0 Sep 3, 2025